Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Fix emitter::emitSplit to not create zero-sized method fragment #107568

Merged
merged 3 commits into from
Sep 20, 2024

Conversation

amanasifkhalid
Copy link
Member

Fixes #106379 (comment). When considering the final split of a method body, skip the split if it will create a zero-sized fragment at the end of the method. Under stress modes, it's fine to ignore the desired fragment size, but in the rare case where we need to split the last piece of the method and it is exactly 512KB, skipping the split might break ISA invariants around how EH data is reported. This seems terribly unlikely, but I documented this just in case we hit it some day.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 9, 2024
@amanasifkhalid
Copy link
Member Author

cc @dotnet/jit-contrib, @BruceForstall could you PTAL? Thanks!

@amanasifkhalid amanasifkhalid requested a review from a team September 10, 2024 20:48
@BruceForstall
Copy link
Member

I'll look soon.

However, regarding "This seems terribly unlikely" -- that is not a satisfying reason to not ensure this works 100% of the time. I've worked on a lot of software where "terribly unlikely" == "guaranteed to happen".

@amanasifkhalid
Copy link
Member Author

amanasifkhalid commented Sep 10, 2024

However, regarding "This seems terribly unlikely" -- that is not a satisfying reason to not ensure this works 100% of the time. I've worked on a lot of software where "terribly unlikely" == "guaranteed to happen".

Agreed; you inspired me to do a quick audit of the coreclr_tests collection, and we do have a handful of methods that have to be split up on ARM32/ARM64/RISC-V/etc. to report their EH info correctly (the largest IL size comes in at 7,557,059 bytes), though none of them happened to be susceptible to creating a zero-size fragment. Instead of trying to add padding this late in the emitter, I think we can instead shorten the candidate size by the size of an instruction if it is equal to the remaining method size. This way, the last fragment will have a non-zero size, and all the fragments will be guaranteed to fit in the unwind data.

Edit: I thought more about this, and doing something like moving the last instruction in the last candidate IG to a new IG to create a non-zero fragment size seems nontrivial. Between the two, it seems safer to give the next IG a non-zero size with padding, though before we pursue that, it would be helpful to verify if having unwind data corresponding to a zero-size code fragment is a problem on platforms that split fragments.

@BruceForstall
Copy link
Member

@amanasifkhalid I'm still trying to understand how the problem could occur. It seems like we are processing IGs in sequence, and if we collect enough that requires splitting, we will do so before reaching the last one. If we reach the last one, and call splitIfNecessary() outside the loop, it will only split if we need a potential split between the last IG in the list and its previous IG. That would imply that the last IG has non-zero size, otherwise we would have split before it.

Where does the zero sized IG come from?

Do you have a SPMI MCH I can use to repro?

@amanasifkhalid
Copy link
Member Author

Where does the zero sized IG come from?

The zero-sized IG seems to be specific to JitStress. From the JIT dump:

G_M11858_IG08:        ; offs=0x000098, size=0x0000, bbWeight=1, gcVars=0000000000000008 {V01}, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB05 [0008], gcvars, byref, isz
                             ; byrRegs -[x3]


*** JitStress: STRESS_EMITTER ***

IN001b: 000098      align   [0 bytes for IG09]
IN001c: 000098      align   [0 bytes]
IN001d: 000098      align   [0 bytes]
IN001e: 000098      align   [0 bytes]
						;; size=0 bbWeight=1 PerfScore 0.00
G_M11858_IG09:        ; offs=0x000098, size=0x000C, bbWeight=9.00, gcVars=0000000000000009 {V00 V01}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, funclet prolog, nogc

This seems to match the scenario described in this comment in emitter::emitSplit:

// Don't report a zero-size candidate. This will only occur in a stress mode with JitSplitFunctionSize
// set to something small, and a zero-sized IG (possibly inserted for use by the alignment code). Normally,
// the split size will be much larger than the maximum size of an instruction group. The invariant we want
// to maintain is that each fragment contains a non-zero amount of code.
if (candidateSize == 0)

So I'm guessing this scenario is only possible under stress modes? We shouldn't normally see zero-sized IGs, right?

Do you have a SPMI MCH I can use to repro?

Sure. I'm not getting a good collection using an AltJit, so I'm going to rebuild my arm64 runtime; just a minute...

@amanasifkhalid
Copy link
Member Author

@BruceForstall if you have an arm64 machine to try this on, Method4 hits asserts with DOTNET_JitStress=1:

using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;
using System.Numerics;
public class TestClass
{
    public struct S1
    {
        public struct S1_D1_F1
        {
            public int int_0;
        }
        public sbyte sbyte_1;
    }
    public struct S2
    {
        public Vector<uint> v_uint_2;
    }
    static Vector64<short> s_v64_short_20 = Vector64.Create(((short)(4)), -5, 4, 2);
    static Vector64<int> s_v64_int_22 = Vector64<int>.AllBitsSet;
    static Vector64<long> s_v64_long_24 = Vector64<long>.AllBitsSet;
    static Vector128<int> s_v128_int_32 = Vector128.CreateScalar(((int)(2)));
    static S1 s_s1_52 = new S1();
    static S2 s_s2_53 = new S2();
    Vector64<short> v64_short_71 = Vector64.Create(((short)(4)), -2, -1, 4);
    Vector64<long> v64_long_75 = Vector64<long>.Zero;
    Vector128<int> v128_int_83 = Vector128.Create(((int)(0)), 2, 4, -1);
    S1 s1_103 = new S1();
    S2 s2_104 = new S2();
    static int s_loopInvariant = 8;
    public S2 Method4(out S1 p_s1_198, S2 p_s2_199, ref S2 p_s2_200, out S1 p_s1_201, ref S1 p_s1_202, S2 p_s2_203, S2 p_s2_204, out S1 p_s1_205)
    {
        unchecked
        {
            p_s1_198 = s1_103;
            p_s1_201 = p_s1_198;
            p_s1_205 = s1_103;
            try
            {
                s_v64_long_24 = ((Vector64<long>)(((Vector64<long>)(s_v64_long_24 = ((Vector64<long>)(v64_long_75 = v64_long_75)))) | s_v64_long_24));
            }
            finally
            {
                int __loopvar0 = s_loopInvariant, __loopSecondaryVar0_0 = s_loopInvariant - 9;
                for (;;)
                {
                    if (__loopvar0 <= s_loopInvariant + 9)
                        break;
                    try
                    {
                        v128_int_83 -= ((Vector128<int>)(((Vector128<int>)(((Vector128<int>)(s_v128_int_32 -= v128_int_83)) | ((Vector128<int>)(s_v128_int_32 | Vector128.WithUpper(s_v128_int_32, s_v64_int_22))))) | v128_int_83));
                    }
                    finally
                    {
                        v64_short_71 *= ((Vector64<short>)(s_v64_short_20 *= ((Vector64<short>)(Vector64.LessThanOrEqual(v64_short_71, s_v64_short_20) - ((Vector64<short>)(((Vector64<short>)(s_v64_short_20 * s_v64_short_20)) - ((Vector64<short>)(Vector64<short>.AllBitsSet * v64_short_71))))))));
                    }
                }
            }
            return s_s2_53;
        }
    }
    public void Method0()
    {
        unchecked
        {
            S1 s1_2942 = new S1();
            S2 s2_2943 = new S2();
            s_s2_53 = Method4(out s1_103, s_s2_53, ref s2_104, out s_s1_52, ref s1_103, s2_2943, s2_104, out s1_2942);
        }
    }
    public static void Main(string[] args)
    {
        new TestClass().Method0();
    }
}

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if some more verbose comments about the possible scenario, as described here would be helpful.

E.g., why is this only a problem for the last call to splitIfNecessary and not the ones in between? I think that's because any zero-sized IG that aren't the last will get "absorbed" into following non-zero-sized IGs.

Comment on lines 3076 to 3079
// TODO: This scenario may be possible (but very unlikely) when using the default fragment size.
// Fixing this would likely require padding out the last fragment to not be zero-sized, and doing the split.
// For now, assume this only occurs under stress modes, or when using DOTNET_JitSplitFunctionSize.
assert(maxSplitSize != UW_MAX_FRAGMENT_SIZE_BYTES);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this comment is quite right. Yes, it is unlikely. But we're talking about a single empty fragment at the end. If there was a non-empty fragment, then curSize != candidateSize. So it seems like the fix works for any split size, including the default, and doesn't require this assert or TODO. (It might require other commenting/documentation, but that's a separate question.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we're talking about a single empty fragment at the end.

Yeah, on second thought, I think you're right. If we have a 512KB-sized IG, followed by an empty IG that happens to be zero-sized, then the last fragment should be exactly 512KB in size, which shouldn't break any unwind data invariants. I'll remove the TODO.

@amanasifkhalid
Copy link
Member Author

I wonder if some more verbose comments about the possible scenario, as described #106379 (comment) would be helpful.

Thanks for digging into this example; I've added some comments describing the scenarios that could hit the fix.

@amanasifkhalid amanasifkhalid merged commit 8756775 into dotnet:main Sep 20, 2024
108 checks passed
@amanasifkhalid amanasifkhalid deleted the fix-emit-split branch September 20, 2024 03:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JIT: Assertion failed 'endOffset > startOffset' during 'Emit GC+EH tables'
2 participants